Overview

Dataset statistics

Number of variables: 10
Number of observations: 40000
Missing cells: 9014
Missing cells (%): 2.3%
Duplicate rows: 11
Duplicate rows (%): < 0.1%
Total size in memory: 3.1 MiB
Average record size in memory: 80.0 B
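
The percentages and sizes above follow from the raw counts. The sketch below re-derives them as a sanity check; the 8-bytes-per-value figure is an assumption (it is what a 64-bit numeric column would use) and the small index overhead is ignored.

```python
# Figures copied from the dataset statistics above.
n_vars = 10
n_obs = 40_000
missing_cells = 9_014
duplicate_rows = 11

total_cells = n_vars * n_obs                      # 400,000 cells in total
missing_pct = 100 * missing_cells / total_cells   # share of empty cells
duplicate_pct = 100 * duplicate_rows / n_obs      # share of duplicate rows

# Assumption: every value is stored as an 8-byte number.
bytes_per_value = 8
record_bytes = n_vars * bytes_per_value
total_mib = n_obs * record_bytes / 2**20

print(f"missing cells: {missing_pct:.1f}%")
print(f"duplicate rows below 0.1%: {duplicate_pct < 0.1}")
print(f"record size: {record_bytes} B, total: {total_mib:.1f} MiB")
```

Under that assumption the numbers agree with the report: about 2.3% missing cells, 80 B per record and roughly 3.1 MiB in total.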

Variable types

Numeric: 10

Alerts

Dataset has 11 (< 0.1%) duplicate rows [Duplicates]
vezes_passou_de_30_59_dias is highly correlated with numero_vezes_passou_90_dias and 1 other field [High correlation]
numero_vezes_passou_90_dias is highly correlated with vezes_passou_de_30_59_dias and 1 other field [High correlation]
numero_de_vezes_que_passou_60_89_dias is highly correlated with vezes_passou_de_30_59_dias and 1 other field [High correlation]
salario_mensal has 7968 (19.9%) missing values [Missing]
numero_de_dependentes has 1046 (2.6%) missing values [Missing]
util_linhas_inseguras is highly skewed (γ1 = 61.17841833) [Skewed]
vezes_passou_de_30_59_dias is highly skewed (γ1 = 23.29605555) [Skewed]
razao_debito is highly skewed (γ1 = 92.00432638) [Skewed]
salario_mensal is highly skewed (γ1 = 72.34132664) [Skewed]
numero_vezes_passou_90_dias is highly skewed (γ1 = 23.82046421) [Skewed]
numero_de_vezes_que_passou_60_89_dias is highly skewed (γ1 = 24.12412168) [Skewed]
util_linhas_inseguras has 2895 (7.2%) zeros [Zeros]
vezes_passou_de_30_59_dias has 33549 (83.9%) zeros [Zeros]
razao_debito has 1083 (2.7%) zeros [Zeros]
salario_mensal has 418 (1.0%) zeros [Zeros]
numero_linhas_crdto_aberto has 469 (1.2%) zeros [Zeros]
numero_vezes_passou_90_dias has 37826 (94.6%) zeros [Zeros]
numero_emprestimos_imobiliarios has 15029 (37.6%) zeros [Zeros]
numero_de_vezes_que_passou_60_89_dias has 37930 (94.8%) zeros [Zeros]
numero_de_dependentes has 23250 (58.1%) zeros [Zeros]
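
The γ1 values in the skewness alerts are sample skewness coefficients. The following is a minimal sketch of one common estimator, the bias-adjusted Fisher-Pearson coefficient; that the report uses exactly this estimator is an assumption, and the toy samples are hypothetical rather than taken from the dataset.

```python
import math


def sample_skewness(values):
    """Bias-adjusted Fisher-Pearson skewness of a sequence of numbers."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    if m2 == 0:
        return 0.0  # constant data has no asymmetry
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


# Hypothetical toy samples, not taken from the dataset.
symmetric = [1, 2, 3, 4, 5]
zero_heavy = [0] * 95 + [1, 1, 2, 3, 98]   # mostly zeros plus one extreme value

print(sample_skewness(symmetric))
print(sample_skewness(zero_heavy))
```

A column dominated by zeros with a few very large entries, like the delinquency counters flagged above, produces a large positive coefficient, while a symmetric sample gives zero.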

Reproduction

Analysis started: 2022-05-06 13:54:02.126132
Analysis finished: 2022-05-06 13:54:28.372813
Duration: 26.25 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
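
A report of this kind can be regenerated along the following lines. This is only a sketch: the input file is not named in the report, so `csv_path` and `html_path` are hypothetical placeholders, the title is arbitrary, and the settings stored in config.json are not reproduced here.

```python
def build_report(csv_path, html_path):
    """Profile a CSV file and write the result as an HTML report.

    Both paths are hypothetical placeholders supplied by the caller.
    """
    # Imported lazily so this module still loads where the libraries are absent.
    import pandas as pd
    from pandas_profiling import ProfileReport  # package name as of v3.2.0

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title="Dataset profile")
    profile.to_file(html_path)
    return profile
```

Calling `build_report("data.csv", "report.html")` with real paths would run the analysis with default settings, which may differ from the configuration used for this run.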

Variables

util_linhas_inseguras
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 34211
Distinct (%): 85.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.374199262
Minimum: 0
Maximum: 22000
Zeros: 2895
Zeros (%): 7.2%
Negative: 0
Negative (%): 0.0%
Memory size: 312.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.02936178475
median: 0.1494907605
Q3: 0.5497619695
95-th percentile: 0.9999999
Maximum: 22000
Range: 22000
Interquartile range (IQR): 0.5204001848
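
The range and the interquartile range are simple differences of the quantiles listed above. The short check below recomputes both from the reported values and compares their magnitudes.

```python
# Quantile statistics copied from the util_linhas_inseguras block above.
minimum = 0.0
q1 = 0.02936178475
q3 = 0.5497619695
maximum = 22000.0

value_range = maximum - minimum   # spread between the two extremes
iqr = q3 - q1                     # spread of the middle half of the data

print(f"range = {value_range:g}")
print(f"IQR   = {iqr:.10f}")
print(f"range / IQR = {value_range / iqr:,.0f}")
```

The range is several orders of magnitude larger than the IQR, which shows that it is driven by a few extreme values rather than by the bulk of the observations.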

Descriptive statistics

Standard deviation: 242.6172471
Coefficient of variation (CV): 38.06238826
Kurtosis: 4476.926731
Mean: 6.374199262
Median Absolute Deviation (MAD): 0.1441174735
Skewness: 61.17841833
Sum: 254967.9705
Variance: 58863.12861
Monotonicity: Not monotonic
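
Several of the descriptive statistics are tied together by simple identities. The sketch below checks two of them against the reported figures and illustrates the median absolute deviation on a hypothetical toy sample.

```python
import statistics

# Figures copied from the descriptive statistics above.
mean = 6.374199262
std = 242.6172471

cv = std / mean        # coefficient of variation: spread relative to the mean
variance = std ** 2    # variance is the squared standard deviation

print(f"CV       = {cv:.4f}")
print(f"variance = {variance:.2f}")


def median_abs_deviation(values):
    """Median of the absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


# Hypothetical toy sample: the outlier inflates the standard deviation,
# but the MAD stays at 1, the same as for [1, 2, 3, 4, 5].
toy = [1, 2, 3, 4, 1000]
print(median_abs_deviation(toy))
```

This robustness is why the MAD of this variable (about 0.14) is so much smaller than its standard deviation (about 242.6).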
Histogram with fixed size bins (bins=50)
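
With a maximum of 22000 and 50 bins, each bin is 440 units wide, so every value up to the 95th percentile (0.9999999) lands in the first bin. The sketch below shows this effect, assuming the bins are equal-width intervals spanning the minimum to the maximum; the toy data is hypothetical.

```python
def fixed_width_histogram(values, bins):
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [len(values)] + [0] * (bins - 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum belongs to the last bin rather than a bin of its own.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


# Hypothetical toy data shaped like the column: ratios in [0, 1) plus one
# extreme value equal to the reported maximum.
toy = [i / 100 for i in range(100)] + [22000.0]
counts = fixed_width_histogram(toy, bins=50)

print(counts[0], counts[-1])   # first bin holds the bulk, last bin the outlier
```

For variables this skewed, a histogram on a logarithmic axis or on clipped values is usually easier to read.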
Most frequent values
Value | Count | Frequency (%)
0 | 2895 | 7.2%
0.9999999 | 2671 | 6.7%
1 | 5 | < 0.1%
1.003322259 | 4 | < 0.1%
0.830033993 | 3 | < 0.1%
0.9500998 | 3 | < 0.1%
0.00839832 | 3 | < 0.1%
0.986602679 | 3 | < 0.1%
0.004999 | 3 | < 0.1%
0.0499002 | 3 | < 0.1%
Other values (34201) | 34407 | 86.0%

Smallest values
Value | Count | Frequency (%)
0 | 2895 | 7.2%
8.37 × 10^-6 | 1 | < 0.1%
1.25 × 10^-5 | 1 | < 0.1%
1.43 × 10^-5 | 1 | < 0.1%
1.51 × 10^-5 | 1 | < 0.1%
1.88 × 10^-5 | 1 | < 0.1%
2.1 × 10^-5 | 1 | < 0.1%
2.66 × 10^-5 | 1 | < 0.1%
2.85 × 10^-5 | 1 | < 0.1%
2.86 × 10^-5 | 1 | < 0.1%

Largest values
Value | Count | Frequency (%)
22000 | 1 | < 0.1%
20514 | 1 | < 0.1%
18300 | 1 | < 0.1%
13930 | 1 | < 0.1%
12369 | 1 | < 0.1%
10151 | 1 | < 0.1%
8710 | 1 | < 0.1%
8328 | 1 | < 0.1%
7555 | 1 | < 0.1%
7452 | 1 | < 0.1%

idade
Real number (ℝ≥0)

Distinct: 81
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 52.404025
Minimum: 21
Maximum: 109
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 312.6 KiB

Quantile statistics

Minimum: 21
5-th percentile: 29
Q1: 41
median: 52
Q3: 63
95-th percentile: 78
Maximum: 109
Range: 88
Interquartile range (IQR): 22

Descriptive statistics

Standard deviation: 14.78146816
Coefficient of variation (CV): 0.2820674207
Kurtosis: -0.4877450013
Mean: 52.404025
Median Absolute Deviation (MAD): 11
Skewness: 0.1925811216
Sum: 2096161
Variance: 218.4918011
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
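Most frequent values
Value | Count | Frequency (%)
49 | 1049 | 2.6%
50 | 1044 | 2.6%
63 | 1014 | 2.5%
48 | 1012 | 2.5%
47 | 1001 | 2.5%
53 | 988 | 2.5%
56 | 987 | 2.5%
45 | 977 | 2.4%
62 | 969 | 2.4%
52 | 967 | 2.4%
Other values (71) | 29992 | 75.0%

Smallest values
Value | Count | Frequency (%)
21 | 41 | 0.1%
22 | 110 | 0.3%
23 | 172 | 0.4%
24 | 200 | 0.5%
25 | 249 | 0.6%
26 | 319 | 0.8%
27 | 369 | 0.9%
28 | 408 | 1.0%
29 | 406 | 1.0%
30 | 552 | 1.4%

Largest values
Value | Count | Frequency (%)
109 | 1 | < 0.1%
103 | 1 | < 0.1%
101 | 1 | < 0.1%
98 | 1 | < 0.1%
97 | 6 | < 0.1%
96 | 8 | < 0.1%
95 | 11 | < 0.1%
94 | 17 | < 0.1%
93 | 16 | < 0.1%
92 | 27 | 0.1%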

vezes_passou_de_30_59_dias
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.412725
Minimum: 0
Maximum: 98
Zeros: 33549
Zeros (%): 83.9%
Negative: 0
Negative (%): 0.0%
Memory size: 312.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 2
Maximum: 98
Range: 98
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4.057899055
Coefficient of variation (CV): 9.83196815
Kurtosis: 556.7069732
Mean: 0.412725
Median Absolute Deviation (MAD): 0
Skewness: 23.29605555
Sum: 16509
Variance: 16.46654474
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)
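Most frequent values
Value | Count | Frequency (%)
0 | 33549 | 83.9%
1 | 4307 | 10.8%
2 | 1255 | 3.1%
3 | 450 | 1.1%
4 | 206 | 0.5%
5 | 96 | 0.2%
98 | 66 | 0.2%
6 | 42 | 0.1%
7 | 13 | < 0.1%
8 | 8 | < 0.1%
Other values (4) | 8 | < 0.1%

Smallest values
Value | Count | Frequency (%)
0 | 33549 | 83.9%
1 | 4307 | 10.8%
2 | 1255 | 3.1%
3 | 450 | 1.1%
4 | 206 | 0.5%
5 | 96 | 0.2%
6 | 42 | 0.1%
7 | 13 | < 0.1%
8 | 8 | < 0.1%
9 | 4 | < 0.1%

Largest values
Value | Count | Frequency (%)
98 | 66 | 0.2%
96 | 1 | < 0.1%
11 | 1 | < 0.1%
10 | 2 | < 0.1%
9 | 4 | < 0.1%
8 | 8 | < 0.1%
7 | 13 | < 0.1%
6 | 42 | 0.1%
5 | 96 | 0.2%
4 | 206 | 0.5%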

razao_debito
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 33792
Distinct (%): 84.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 352.1175981
Minimum: 0
Maximum: 307001
Zeros: 1083
Zeros (%): 2.7%
Negative: 0
Negative (%): 0.0%
Memory size: 312.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.004553452
Q1: 0.175412294
median: 0.366222329
Q3: 0.8746777677
95-th percentile: 2410
Maximum: 307001
Range: 307001
Interquartile range (IQR): 0.6992654737

Descriptive statistics

Standard deviation: 2084.204327
Coefficient of variation (CV): 5.919057549
Kurtosis: 12514.81439
Mean: 352.1175981
Median Absolute Deviation (MAD): 0.245857968
Skewness: 92.00432638
Sum: 14084703.92
Variance: 4343907.677
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values
Value | Count | Frequency (%)
0 | 1083 | 2.7%
1 | 58 | 0.1%
2 | 57 | 0.1%
4 | 50 | 0.1%
9 | 41 | 0.1%
6 | 36 | 0.1%
5 | 34 | 0.1%
8 | 33 | 0.1%
11 | 33 | 0.1%
3 | 32 | 0.1%
Other values (33782) | 38543 | 96.4%

Smallest values
Value | Count | Frequency (%)
0 | 1083 | 2.7%
6.62 × 10^-5 | 1 | < 0.1%
0.000106349 | 1 | < 0.1%
0.000109487 | 1 | < 0.1%
0.000120279 | 1 | < 0.1%
0.00012604 | 1 | < 0.1%
0.000136221 | 1 | < 0.1%
0.000140528 | 1 | < 0.1%
0.000147298 | 1 | < 0.1%
0.000149993 | 1 | < 0.1%

Largest values
Value | Count | Frequency (%)
307001 | 1 | < 0.1%
155375 | 1 | < 0.1%
60212 | 1 | < 0.1%
49112 | 1 | < 0.1%
36705 | 1 | < 0.1%
34719 | 1 | < 0.1%
34102 | 1 | < 0.1%
30295 | 1 | < 0.1%
24591 | 1 | < 0.1%
21395 | 1 | < 0.1%

salario_mensal
Real number (ℝ≥0)

MISSING
SKEWED
ZEROS

Distinct: 7879
Distinct (%): 24.6%
Missing: 7968
Missing (%): 19.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 6760.601836
Minimum: 0
Maximum: 1794060
Zeros: 418
Zeros (%): 1.0%
Negative: 0
Negative (%): 0.0%
Memory size: 312.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1300
Q1: 3400
median: 5409.5
Q3: 8284
95-th percentile: 14583.45
Maximum: 1794060
Range: 1794060
Interquartile range (IQR): 4884

Descriptive statistics

Standard deviation: 16836.38658
Coefficient of variation (CV): 2.490368015
Kurtosis: 6646.833242
Mean: 6760.601836
Median Absolute Deviation (MAD): 2338
Skewness: 72.34132664
Sum: 216555598
Variance: 283463913
Monotonicity: Not monotonic
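
Because salario_mensal has missing values, it is worth checking which denominator its statistics use. The figures above are consistent with the mean and the distinct percentage being computed over the non-missing values only, as this short check suggests.

```python
# Figures copied from the salario_mensal block above.
n_obs = 40_000
n_missing = 7_968
n_distinct = 7_879
mean = 6760.601836
reported_sum = 216_555_598

n_present = n_obs - n_missing                 # non-missing observations

# If the mean ignores missing values, mean * n_present reproduces the sum.
implied_sum = mean * n_present
distinct_pct = 100 * n_distinct / n_present   # share among non-missing values

print(f"non-missing values: {n_present}")
print(f"implied sum: {implied_sum:,.0f} (reported {reported_sum:,})")
print(f"distinct: {distinct_pct:.1f}%")
```

Dividing by all 40000 observations instead would give a distinct share of about 19.7%, which does not match the reported 24.6%.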
Histogram with fixed size bins (bins=50)
Most frequent values
Value | Count | Frequency (%)
5000 | 722 | 1.8%
4000 | 501 | 1.3%
6000 | 497 | 1.2%
3000 | 467 | 1.2%
0 | 418 | 1.0%
2500 | 416 | 1.0%
10000 | 398 | 1.0%
3500 | 373 | 0.9%
4500 | 333 | 0.8%
7000 | 317 | 0.8%
Other values (7869) | 27590 | 69.0%
(Missing) | 7968 | 19.9%

Smallest values
Value | Count | Frequency (%)
0 | 418 | 1.0%
1 | 172 | 0.4%
4 | 1 | < 0.1%
7 | 1 | < 0.1%
10 | 1 | < 0.1%
15 | 1 | < 0.1%
40 | 1 | < 0.1%
83 | 1 | < 0.1%
100 | 8 | < 0.1%
120 | 1 | < 0.1%

Largest values
Value | Count | Frequency (%)
1794060 | 1 | < 0.1%
1560100 | 1 | < 0.1%
835040 | 1 | < 0.1%
730483 | 1 | < 0.1%
649587 | 1 | < 0.1%
562466 | 1 | < 0.1%
440000 | 1 | < 0.1%
428250 | 1 | < 0.1%
250000 | 1 | < 0.1%
150000 | 2 | < 0.1%

numero_linhas_crdto_aberto
Real number (ℝ≥0)

ZEROS

Distinct: 52
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.472525
Minimum: 0
Maximum: 57
Zeros: 469
Zeros (%): 1.2%
Negative: 0
Negative (%): 0.0%
Memory size: 312.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 5
median: 8
Q3: 11
95-th percentile: 18
Maximum: 57
Range: 57
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 5.164960024
Coefficient of variation (CV): 0.6096128396
Kurtosis: 3.309707064
Mean: 8.472525
Median Absolute Deviation (MAD): 3
Skewness: 1.260674453
Sum: 338901
Variance: 26.67681204
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
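Most frequent values
Value | Count | Frequency (%)
6 | 3626 | 9.1%
7 | 3570 | 8.9%
5 | 3410 | 8.5%
8 | 3391 | 8.5%
9 | 3115 | 7.8%
4 | 3097 | 7.7%
10 | 2529 | 6.3%
3 | 2406 | 6.0%
11 | 2214 | 5.5%
12 | 1883 | 4.7%
Other values (42) | 10759 | 26.9%

Smallest values
Value | Count | Frequency (%)
0 | 469 | 1.2%
1 | 1186 | 3.0%
2 | 1751 | 4.4%
3 | 2406 | 6.0%
4 | 3097 | 7.7%
5 | 3410 | 8.5%
6 | 3626 | 9.1%
7 | 3570 | 8.9%
8 | 3391 | 8.5%
9 | 3115 | 7.8%

Largest values
Value | Count | Frequency (%)
57 | 1 | < 0.1%
53 | 1 | < 0.1%
52 | 1 | < 0.1%
50 | 1 | < 0.1%
49 | 2 | < 0.1%
48 | 1 | < 0.1%
47 | 1 | < 0.1%
45 | 3 | < 0.1%
43 | 4 | < 0.1%
42 | 4 | < 0.1%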

numero_vezes_passou_90_dias
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.255025
Minimum: 0
Maximum: 98
Zeros: 37826
Zeros (%): 94.6%
Negative: 0
Negative (%): 0.0%
Memory size: 312.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 98
Range: 98
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4.034326975
Coefficient of variation (CV): 15.81933918
Kurtosis: 573.6285573
Mean: 0.255025
Median Absolute Deviation (MAD): 0
Skewness: 23.82046421
Sum: 10201
Variance: 16.27579414
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=18)
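Most frequent values
Value | Count | Frequency (%)
0 | 37826 | 94.6%
1 | 1357 | 3.4%
2 | 395 | 1.0%
3 | 181 | 0.5%
4 | 87 | 0.2%
98 | 66 | 0.2%
5 | 28 | 0.1%
6 | 22 | 0.1%
7 | 17 | < 0.1%
8 | 5 | < 0.1%
Other values (8) | 16 | < 0.1%

Smallest values
Value | Count | Frequency (%)
0 | 37826 | 94.6%
1 | 1357 | 3.4%
2 | 395 | 1.0%
3 | 181 | 0.5%
4 | 87 | 0.2%
5 | 28 | 0.1%
6 | 22 | 0.1%
7 | 17 | < 0.1%
8 | 5 | < 0.1%
9 | 4 | < 0.1%

Largest values
Value | Count | Frequency (%)
98 | 66 | 0.2%
96 | 1 | < 0.1%
15 | 1 | < 0.1%
14 | 1 | < 0.1%
13 | 2 | < 0.1%
12 | 2 | < 0.1%
11 | 3 | < 0.1%
10 | 2 | < 0.1%
9 | 4 | < 0.1%
8 | 5 | < 0.1%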

numero_emprestimos_imobiliarios
Real number (ℝ≥0)

ZEROS

Distinct: 21
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.0137
Minimum: 0
Maximum: 25
Zeros: 15029
Zeros (%): 37.6%
Negative: 0
Negative (%): 0.0%
Memory size: 312.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 2
95-th percentile: 3
Maximum: 25
Range: 25
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.112494158
Coefficient of variation (CV): 1.09745897
Kurtosis: 21.31902269
Mean: 1.0137
Median Absolute Deviation (MAD): 1
Skewness: 2.577364186
Sum: 40548
Variance: 1.237643251
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=21)
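Most frequent values
Value | Count | Frequency (%)
0 | 15029 | 37.6%
1 | 13984 | 35.0%
2 | 8361 | 20.9%
3 | 1656 | 4.1%
4 | 569 | 1.4%
5 | 193 | 0.5%
6 | 83 | 0.2%
7 | 45 | 0.1%
8 | 26 | 0.1%
9 | 20 | 0.1%
Other values (11) | 34 | 0.1%

Smallest values
Value | Count | Frequency (%)
0 | 15029 | 37.6%
1 | 13984 | 35.0%
2 | 8361 | 20.9%
3 | 1656 | 4.1%
4 | 569 | 1.4%
5 | 193 | 0.5%
6 | 83 | 0.2%
7 | 45 | 0.1%
8 | 26 | 0.1%
9 | 20 | 0.1%

Largest values
Value | Count | Frequency (%)
25 | 1 | < 0.1%
20 | 1 | < 0.1%
19 | 1 | < 0.1%
18 | 1 | < 0.1%
16 | 1 | < 0.1%
15 | 2 | < 0.1%
14 | 1 | < 0.1%
13 | 2 | < 0.1%
12 | 8 | < 0.1%
11 | 8 | < 0.1%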

numero_de_vezes_que_passou_60_89_dias
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.23075
Minimum: 0
Maximum: 98
Zeros: 37930
Zeros (%): 94.8%
Negative: 0
Negative (%): 0.0%
Memory size: 312.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 98
Range: 98
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4.017506434
Coefficient of variation (CV): 17.41064543
Kurtosis: 583.8956885
Mean: 0.23075
Median Absolute Deviation (MAD): 0
Skewness: 24.12412168
Sum: 9230
Variance: 16.14035795
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
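Most frequent values
Value | Count | Frequency (%)
0 | 37930 | 94.8%
1 | 1562 | 3.9%
2 | 293 | 0.7%
3 | 96 | 0.2%
98 | 66 | 0.2%
4 | 38 | 0.1%
5 | 8 | < 0.1%
6 | 4 | < 0.1%
7 | 2 | < 0.1%
96 | 1 | < 0.1%

Smallest values
Value | Count | Frequency (%)
0 | 37930 | 94.8%
1 | 1562 | 3.9%
2 | 293 | 0.7%
3 | 96 | 0.2%
4 | 38 | 0.1%
5 | 8 | < 0.1%
6 | 4 | < 0.1%
7 | 2 | < 0.1%
96 | 1 | < 0.1%
98 | 66 | 0.2%

Largest values
Value | Count | Frequency (%)
98 | 66 | 0.2%
96 | 1 | < 0.1%
7 | 2 | < 0.1%
6 | 4 | < 0.1%
5 | 8 | < 0.1%
4 | 38 | 0.1%
3 | 96 | 0.2%
2 | 293 | 0.7%
1 | 1562 | 3.9%
0 | 37930 | 94.8%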

numero_de_dependentes
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 12
Distinct (%): < 0.1%
Missing: 1046
Missing (%): 2.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.7565076757
Minimum: 0
Maximum: 13
Zeros: 23250
Zeros (%): 58.1%
Negative: 0
Negative (%): 0.0%
Memory size: 312.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 3
Maximum: 13
Range: 13
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.11624413
Coefficient of variation (CV): 1.475522544
Kurtosis: 2.721377542
Mean: 0.7565076757
Median Absolute Deviation (MAD): 0
Skewness: 1.579546639
Sum: 29469
Variance: 1.246000958
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
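Most frequent values
Value | Count | Frequency (%)
0 | 23250 | 58.1%
1 | 6900 | 17.2%
2 | 5216 | 13.0%
3 | 2585 | 6.5%
4 | 751 | 1.9%
5 | 183 | 0.5%
6 | 43 | 0.1%
7 | 13 | < 0.1%
8 | 8 | < 0.1%
9 | 3 | < 0.1%
Other values (2) | 2 | < 0.1%
(Missing) | 1046 | 2.6%

Smallest values
Value | Count | Frequency (%)
0 | 23250 | 58.1%
1 | 6900 | 17.2%
2 | 5216 | 13.0%
3 | 2585 | 6.5%
4 | 751 | 1.9%
5 | 183 | 0.5%
6 | 43 | 0.1%
7 | 13 | < 0.1%
8 | 8 | < 0.1%
9 | 3 | < 0.1%

Largest values
Value | Count | Frequency (%)
13 | 1 | < 0.1%
10 | 1 | < 0.1%
9 | 3 | < 0.1%
8 | 8 | < 0.1%
7 | 13 | < 0.1%
6 | 43 | 0.1%
5 | 183 | 0.5%
4 | 751 | 1.9%
3 | 2585 | 6.5%
2 | 5216 | 13.0%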

Interactions

[Figures: 100 pairwise interaction plots, one per ordered pair of the 10 variables; images not reproduced.]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it captures nonlinear monotonic relationships better than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
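
A minimal sketch of that calculation follows, using only the standard library. Giving tied values the average of their ranks is an assumption (it is the usual convention), and the toy data is hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
print(spearman(x, y), pearson(x, y))
```

On this toy data ρ is 1 up to rounding because the relationship is perfectly monotonic, while r stays below 1 because it is not linear.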

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
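
The invariance property can be checked numerically. The sketch below uses hypothetical toy data and positive scale factors; a negative factor would flip the sign of r.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
# Exact lines with very different slopes both give r = 1.
r_steep = pearson_r(x, [100 * a for a in x])
r_shallow = pearson_r(x, [0.01 * a for a in x])

print(r, r_transformed, r_steep, r_shallow)
```

This is why a large r says how tightly the points follow a line, not how steep that line is.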

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
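
The pair-counting definition can be written directly as code. This sketch implements the plain variant described in the text (often called τ-a); libraries commonly apply a correction for ties, so results on data with many tied values may differ from what the report shows. The toy data is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1     # both variables move the same way
        elif direction < 0:
            discordant += 1     # the variables move in opposite ways
    return (concordant - discordant) / len(pairs)


# Hypothetical toy data: two adjacent swaps in an otherwise increasing order.
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
print(kendall_tau_a(x, y))
```

Of the 10 pairs in this example, 8 are concordant and 2 are discordant, giving τ = 0.6.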

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

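index | util_linhas_inseguras | idade | vezes_passou_de_30_59_dias | razao_debito | salario_mensal | numero_linhas_crdto_aberto | numero_vezes_passou_90_dias | numero_emprestimos_imobiliarios | numero_de_vezes_que_passou_60_89_dias | numero_de_dependentes
0 | 0.025849 | 62 | 0 | 0.081775 | 8180.0 | 3 | 0 | 2 | 0 | 0.0
1 | 0.667083 | 55 | 0 | 0.153112 | 2200.0 | 3 | 0 | 0 | 0 | 0.0
2 | 0.007093 | 44 | 0 | 0.148800 | 7499.0 | 20 | 0 | 1 | 0 | 0.0
3 | 0.091213 | 54 | 0 | 0.351635 | 5900.0 | 15 | 0 | 1 | 1 | 0.0
4 | 0.112680 | 54 | 0 | 0.065959 | 2167.0 | 3 | 0 | 0 | 0 | 0.0
5 | 0.323985 | 42 | 0 | 0.352151 | 10415.0 | 8 | 0 | 2 | 0 | 2.0
6 | 0.007300 | 63 | 0 | 0.002289 | 4368.0 | 2 | 0 | 0 | 0 | 0.0
7 | 0.000000 | 76 | 1 | 0.261611 | 1700.0 | 4 | 0 | 0 | 0 | 0.0
8 | 0.062280 | 55 | 0 | 0.406667 | 15658.0 | 14 | 1 | 3 | 0 | 0.0
9 | 0.479899 | 41 | 1 | 0.209903 | 8441.0 | 6 | 0 | 3 | 0 | 0.0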

Last rows

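index | util_linhas_inseguras | idade | vezes_passou_de_30_59_dias | razao_debito | salario_mensal | numero_linhas_crdto_aberto | numero_vezes_passou_90_dias | numero_emprestimos_imobiliarios | numero_de_vezes_que_passou_60_89_dias | numero_de_dependentes
39990 | 0.072663 | 38 | 0 | 0.253057 | 10550.0 | 16 | 0 | 2 | 0 | 2.0
39991 | 0.452344 | 51 | 0 | 0.766308 | 4000.0 | 13 | 0 | 2 | 0 | 0.0
39992 | 0.161703 | 31 | 0 | 0.824642 | 2166.0 | 7 | 0 | 2 | 0 | 0.0
39993 | 0.302013 | 47 | 0 | 0.329238 | 5208.0 | 3 | 0 | 2 | 0 | 0.0
39994 | 0.107915 | 59 | 0 | 0.371085 | 7310.0 | 9 | 0 | 2 | 0 | 2.0
39995 | 0.000000 | 68 | 1 | 0.062858 | 10833.0 | 9 | 1 | 0 | 0 | 0.0
39996 | 0.061117 | 84 | 0 | 0.013598 | 8456.0 | 7 | 0 | 0 | 0 | 0.0
39997 | 0.817745 | 70 | 0 | 0.663056 | 6000.0 | 17 | 0 | 2 | 0 | 0.0
39998 | 0.106139 | 64 | 2 | 0.865438 | 5045.0 | 12 | 0 | 4 | 0 | 0.0
39999 | 1.000000 | 28 | 0 | 0.002285 | 3500.0 | 0 | 1 | 0 | 0 | 0.0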

Duplicate rows

Most frequently occurring

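index | util_linhas_inseguras | idade | vezes_passou_de_30_59_dias | razao_debito | salario_mensal | numero_linhas_crdto_aberto | numero_vezes_passou_90_dias | numero_emprestimos_imobiliarios | numero_de_vezes_que_passou_60_89_dias | numero_de_dependentes | # duplicates
1 | 0.0 | 22 | 0 | 0.0 | 820.0 | 2 | 0 | 0 | 0 | 0.0 | 3
3 | 0.0 | 23 | 0 | 0.0 | 898.0 | 2 | 0 | 0 | 0 | 0.0 | 3
8 | 1.0 | 22 | 0 | 0.0 | 820.0 | 1 | 0 | 0 | 0 | 0.0 | 3
0 | 0.0 | 21 | 0 | 0.0 | 0.0 | 1 | 0 | 0 | 0 | 0.0 | 2
2 | 0.0 | 22 | 0 | 0.0 | 929.0 | 2 | 0 | 0 | 0 | 0.0 | 2
4 | 0.0 | 24 | 0 | 0.0 | 0.0 | 1 | 0 | 0 | 0 | 0.0 | 2
5 | 0.0 | 24 | 0 | 0.0 | 820.0 | 2 | 0 | 0 | 0 | 0.0 | 2
6 | 0.0 | 28 | 0 | 0.0 | 2500.0 | 2 | 0 | 0 | 0 | 0.0 | 2
7 | 0.0 | 40 | 0 | 0.0 | 3500.0 | 1 | 0 | 0 | 0 | 0.0 | 2
9 | 1.0 | 23 | 0 | 0.0 | 0.0 | 1 | 0 | 0 | 0 | 0.0 | 2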
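
The "# duplicates" column gives the size of each group of identical rows. The ten groups listed cover 23 records, which is more than the 11 duplicate rows reported in the overview, so that figure appears to count distinct duplicated rows rather than individual records. The sketch below shows one way to build such a summary; the toy rows are hypothetical and shortened to three columns.

```python
from collections import Counter


def duplicate_groups(rows):
    """Map each fully repeated row to the number of times it occurs."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Hypothetical toy rows, not the real records.
rows = [
    (0.0, 22, 820.0),
    (0.0, 22, 820.0),
    (0.0, 22, 820.0),
    (1.0, 23, 0.0),
    (1.0, 23, 0.0),
    (0.5, 40, 3500.0),
]
groups = duplicate_groups(rows)

n_groups = len(groups)               # distinct rows that occur more than once
n_records = sum(groups.values())     # records belonging to those groups

# Most frequently occurring groups first.
for row, n in sorted(groups.items(), key=lambda item: -item[1]):
    print(row, n)
print(n_groups, n_records)
```

In the toy example there are 2 duplicated groups covering 5 records, so the two ways of counting give different totals.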